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Efficiently Sampling Multiplicative Attribute Graphs 
Using a Ball-Dropping Process 



Abstract 

We introduce a novel and efficient sampling 
algorithm for the Multiplicative Attribute 
Graph Model (MAGM - Kim & Lcskovec 
(2010)). Our algorithm is strictly more 
efficient than the algorithm proposed by 
Yun & Vishwanathan (2012), in the sense 
that our method extends the best time com- 
plexity guarantee of their algorithm to a 
larger fraction of parameter space. Both in 
theory and in empirical evaluation on sparse 
graphs, our new algorithm outperforms the 
previous one. 

To design our algorithm, we first define a 
stochastic ball-dropping process (BDP). Al- 
though a special case of this process was in- 
troduced as an efficient approximate sam- 
pling algorithm for the Kronecker Prod- 
uct Graph Model (KPGM - Lcskovec et al. 
(2010)), neither why such an apprximation 
works nor what is the actual distribution this 
process is sampling from has been addressed 
so far to the best of our knowledge. 

Our rigorous treatment of the BDP enables 
us to clarify the rational behind a BDP ap- 
proximation of KPGM, and design an effi- 
cient sampling algorithm for the MAGM. 



1. Introduction 

In this paper we are concerned with statistical mod- 
els on graphs. The scalability of the model's inference 
and sampling algorithm is becoming a critical issue 
especially for sparse graphs, as more and more graph 
data is becoming available. For instance, one can eas- 
ily crawl a graph with millions of nodes in few days 
from Twitter. 

In this regard, the Kronecker Product Graph Model 
(KPGM) of Lcskovec et al. (2010) is particularly at- 



tractive. In contrast to traditional models such 
as Exponential Random Graph Model (ERGM) of 
Robins et al. (2007) or Latent Factor Model of Hoff 
(2009) which cannot scale beyond graphs with thou- 
sands of nodes, both inference in and sampling from a 
KPGM scale to graphs with millions of nodes. 

However, the model has recently been criticized to 
be not very realistic, both in theory (Seshadhri et al., 
2011) and in practice (Moreno & Neville, 2009). This 
is actually not very surprising, as the KPGM is clearly 
under-parametrized; usually only four parameters are 
used to fit a graph with millions of nodes. 

In order to enrich the expressive power of the model 
Kim & Lcskovec (2010) recently proposed a gener- 
alization of KPGM, which is named Multiplica- 
tive Attribute Graph Model (MAGM). The advan- 
tage of MAGM over KPGM has been argued from 
both theoretical (Kim & Lcskovec, 2010) and empir- 
ical (Kim & Lcskovec, 2011) perspectives. 

No matter how attractive such a generalization is in 
terms of modeling, we still need to ask does the new 
model have efficient algorithms for inference and sam- 
pling? The inference part of this question was studied 
by Kim & Lcskovec (2011), while sampling part was 
partially addressed by Yun & Vishwanathan (2012). 
In this paper, we further investigate the sampling is- 
sue. 

It is straightforward to sample a graph from a MAGM 



e 



time, where n is the number of nodes. Of 



Preliminary work. 



course, such a naive algorithm does not scale to large 
graphs. Therefore, Yun & Vishwanathan (2012) sug- 
gested an algorithm which first samples O (jlog2 n)^^ 
graphs from a KPGM and quilts relevant parts of the 
sampled graphs together to generate a single sample 
from the MAGM. Since approximate sampling from 
KPGM takes expected O {ck log2 n) time, where is 
the expected number of edges in the KPGM, the quilt- 
ing algorithm runs in O ^(log2 n)'^ ck^ time with high 
probability. The unsatisfactory aspect of the approach 
of Yun & Vishwanathan (2012), however, is that the 
complexity bound holds only when certain technical 
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conditions are met. 

On the other hand, for the most commonly used 
parameter settings (see Section 4.5) our algorithm 

runs in O ^(logj n)'^ (ex + ca/)^ time with high prob- 
ability, where cm is the expected number of edges 
in the MAGM. When the technical conditions of 
Yun & Vishwanathan (2012) are met, then cm = gk- 
Therefore, our method extends the best time complex- 
ity of Yun & Vishwanathan (2012) to a larger fraction 
of parameter space. Not only is our algorithm theoreti- 
cally more interesting, we also show that it empirically 
outperforms the previous algorithm in sampling sparse 
graphs. 

To design our algorithm, we first define a stochas- 
tic ball-dropping process (BDP) (Chakrabarti et al. 
(2004), GroeretaL (2010) and Gleich & Owen 
(To appear.)). Although a special case of BDP was 
already introduced as an approximate sampling algo- 
rithm for KPGM (Lcskovec et al., 2010), to the best 
of our knowledge neither why such an approximation 
works nor what is the actual distribution this process 
is sampling from has been addressed so far. 

Our rigorous treatment of these problems enables us 
to clarify the rational behind a BDP approximation of 
KPGM (Section 3), and design an efficient sampling 
algorithm for MAGM (Section 4). We let BDP to pro- 
pose candidate edges, and then reject some of them 
with certain probability to match the actual MAGM. 
This is the classic accept-reject sampling scheme for 
sampling distributions. The main technical challenge 
which we address in this paper is to show that the pro- 
posal distribution compactly bounds the target distri- 
bution, so that we can guarantee the efficiency of the 
algorithm. 

2. Notation and Preliminaries 

We use upper-case letters for matrices {e.g., A). Sets 
are denoted by upper-case calligraphic letters (e.g., £). 
We use Greek symbols for parameters {e.g., fi), and 
integers are denoted in lower-case {e.g., a,b,i,j). 

A directed graph is an ordered set (V,£), where V is 
the set of nodes V — {1,2,..., n}, and £ is the set of 
edges £ C VxV. We say that there is an edge from 
node i to j when {i,j) € £. Furthermore, for each 
edge {i,j) G £, i and j are called source and target 
node of the edge, respectively. Note that although we 
mainly discuss directed graphs in this paper, most of 
our ideas can be straightforwardly applied to the case 
of undirected graphs. 

It is convenient to describe a graph in terms of its n x n 



adjacency matrix A where the (i, j)-th entry Aij of A 
denotes the number of edges from node i to j. When 
there exists at most one edge between every {i,j) pair, 
i.e., Aij < 1 for all then we call it a simple graph. 
On the other hand if multiple edges are allowed then it 
is called a multi-gv a,ph. In either case, \£\, the number 
of edges in the graph, is equal to J27j=i ^v- 

The Kronecker multiplication of matrices is defined as 
follows (Bernstein, 2005). 



Definition 1 Given real matrices X G 



Y e 



the Kronecker product X (g) F G 



and 

xmq 



XiiY X12Y 



XlmY 



XnlY Xn2Y 



The k-th Kroneck 



er power 



is ®i=iX. 



2.1. Kronecker Product Graph Model 
(KPGM) 

The Kronecker Product Graph Model (KPGM) of 
Leskovec et al. (2010) is usually parametrized by a 
2x2 initiator matrix 



e := 



^00 

9io 



'01 

9ii 



(1) 



with each Oij G [0, 1], and additional size parameter 
d G Z+ . Using Kronecker multiplication, we construct 
a 2'' X 2'* matrix P from 9: 



r := e[''i = e (g) 6 ( 



e. 



(2) 



F is called an edge probability matrix, because under 
the KPGM the probability of observing an edge from 
node i to node j is simply Tij (see Figure 1). From an 
adjacency matrix point of view 6cLc1i -^ij is an indepen- 
dent Bernoulli random variable with P [Aij = 1] = Tij . 

Note that one can make the model more general by 
using multiple initiator matrices O^^^ 8^^^ . . . , 9^''^ 
rather than a single matrix. In this case, the definition 
of edge probability matrix F is modified to 



(3) 



In this paper we will adopt the more general setting 
(3). For notational convenience, we stack these initia- 
tor matrices to form the parameter array 



9 



(9(1), 9(2),..., 9('') 



(4) 
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Also, e[^^ denotes (a + 1, 6+l)-th entry of 9^=). Given 
these parameters, the expected number of edges ck of 
KPGM can be calculated using 



(5) 



fc=l \0<a,6<l 



2.2 



Multiplicative Attribute Graph Model 
(MAGM) 



An alternative way to view KPGM is as follows: as- 
sociate the i-th node with a bit-vector b{i) of length d 
such that 5fc(«) is the k-th digit of integer (« — 1) in its 
binary representation. Then one can verify that the 
(i,j)-th entry of the edge probability matrix F in (3) 
can be written as 



k=l 



Jk) 



(6) 



Under this interpretation, one may consider hk{i) — 1 
(resp. — 0) as denoting the presence (resp. ab- 

sence) of the k-th attribute in node i. The factor 

(k) 

hk(i) bkU) denotes the probability of an edge between 
nodes i and j based on the value of their k-th attribute. 
The attributes are assumed independent, and therefore 
the overall probability of an edge between i and j is 
just the product of bk(j)'^' 

The Muhiphcative Attribute Graph Model (MAGM) 
of Kim & Lcskovcc (2010) is also obtained by associat- 
ing a bit-vector f(i) with a node i. However, /(i) need 
not be the binary representation of (i — 1) as was the 
case in the KPGM. In fact, the number of nodes n need 
not even be equal to 2'*. We simply assume that fkii) is 
a Bernoulli random variable with P [fk{i) = 1] = m'-'^-'- 
In addition to 8 defined in (4), the model now has 
additional parameters fl := [fj,'^^\ fj,^^\ . . . , ^''''), and 
the (i, j)-th entry of the edge probability matrix ^I' is 
written as 



*J 11 fk(i) Jk(3) 
k=l 



(7) 



The expected number of edges under this model will 
be denoted cm, and can be calculated using 



d 



k=i yo<o,&<i 

Note that when /i^^) = ^(^^ ^ ... = = 0.5, we 
have Cm — ex (see Figure 4). 



3. Ball-Dropping Process (BDP) 

A naive but exact method of sampling from KPGM 
is to generate every entry of adjacency matrix A indi- 
vidually. Of course, such an approach requires Q (n^) 
computation and does not scale to large graphs. Al- 
ternatively, Lcskovcc ct al. (2010) suggest the follow- 
ing stochastic process as an approximate but efficient 
sampling algorithm (see Figure 1): 

• First, sample the number of edges \£\ from a Pois- 
son distribution with parameter ck^- 

• The problem of sampling each individual edge is 
then converted to the problem of locating the 
position of a "ball" which will be dropped on 
a 2^* X 2^^ grid. The probability of the ball be- 
ing located at coordinate is proportional 
to Tij. This problem can be solved in O (d) 
time by employing a divide-and-conquer strategy 
(Leskovec et al., 2010). See Figure 1 for a graph- 
ical illustration, and Algorithm 1 in Appendix B 
for the pseudo-code. 

If a graph is sampled from the above process, how- 
ever, there is a nonzero probability that the same pair 
of nodes is sampled multiple times. Therefore, the 
process generates multi-graphs while the sample space 
of KPGM is simple graphs. The above generative pro- 
cess is called a ball-dropping process (BDP), in order to 
distinguish it from the KPGM distribution. Of course, 
the two are closely related. We show the following the- 
orem which characterizes the distribution of BDP and 
clarifies the connection between the two. 

Theorem 2 (Distribution of BDP) // a multi- 
graph G is sampled from a BDP with parameters 
and d, then Aij follows an independent Poisson 
distribution with rate parameter Tij defined by (3). 

Proof See Appendix A.l. ■ 

Recall that in the KPGM, each Aij is drawn from an 
independent Bernoulli distribution, instead of a Pois- 
son distribution. When the expectation of Bernoulli 
distribution is close to zero, it is well-known that 
the Poisson distribution is a very good approximation 
to the Bernoulli distribution (see e.g.. Chapter 1.8, 
DasGupta (2011)). To elaborate this point, suppose 
that a random variable X follows a Poisson distribu- 
tion with rate parameter p, while Y follows a Bernoulli 



^Originally Leskovec et al. (2010) used the normal dis- 
tribution, but Poisson is a very close approximation to 
the normal distribution especially when the number of ex- 
pected edges is a large number (Chapter 1.18, DasGupta 
(2011)). 
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Figure 1. (Best viewed in color) (a) Edge probability ma- 
tTix P of a KPGM with parameter 9 = (0.4, 0.7; 0.7, 0.9) 
and d = 3. Darker cells imply a higher probability of ob- 
serving an edge, (b) To locate the position of an edge, 
the matrix is divided into four quadrants, and one of them 
is chosen randomly with probabihty proportional to the 
weight given by the O matrix. Here, the fourth quadrant 
is chosen, (c) and (d) The above process continues recur- 
sively and finally a location in the 8x8 grid is determined 
for placing an edge. Here nodes 8 and 6 are connected. 



distribution with the same parameter p. Then, using 
the Taylor expansion 



'[X = 0] 



p [r = 0] 



In practice we are interested in large sparse graphs, 
tlierefore most Tij values are close to zero, and the 
Poisson distribution provides a good approximation. 
In fact, this property of the Poisson distribution is 
often used in statistical modeling of sparse graphs to 
make both analysis tractable and computation more 
efficient (see e.g., Karrcr & Newman (2011)). 



3.1. Two Observations 

Note that exp(— p) > 1 — p and consequently the 
probabihty of an edge not being sampled is higher 
in the BDP than in the KPGM. Consequently, the 
BDP generates sparser graphs than exact sampling 
from KPGM. Leskovec et al. (2010) observed this and 
recommend sampling extra edges to compensate for 
tliis effect. Our analysis shows why this phenomenon 
occurs. 

As the BDP is characterized by a Poisson distribution 
instead of the Bernoulli, it only requires non-negativity 

of its parameters. Therefore, for a BDP we do not need 

(k) 



bounded by 1. This extra bit of generality will be 
found useful in the next section. 

4. Sampling Algorithm 

In the MAGM, each entry Aij of the adjacency matrix 
A follows a Bernoulli distribution with parameter . 
To efhciently sample graphs from the model, again we 
approximate Aij by a Poisson distribution with the 
same parameter 5'.^ , as discussed in Section 3. 

A close examination of (6) and (7) reveals that KPGM 
and MAGM are very related. The only difference is 
that in the case of KPGM the i-th node is mapped to 
the bit vector corresponding to (? — 1) while in the case 
of MAGM it is mapped to an integer Ci (not necessarily 
(i — 1)) whose bit vector representation is f{i). We will 
call Ci the color of node i in the sequel^. The concept 
of color clarifies the connection between KPGM and 
MAGM through the following equality 



(9) 



4.1. Problem Transformation 

Let Vc be the set of nodes with color < c < n — 1 



{i : Ci^c} . 



(10) 



Instead of sampling the adjacency matrix A directly, 
we will first generate another matrix B, with Bcc' de- 
fined as 



E E 



(11) 



In other words, Bcc' is the number of edges from nodes 
with color c to nodes with color c'. It is easy to verify 
that each Bed is a sum of Poisson random variables 
and hence also follows Poisson distribution (Chapter 
13, DasGupta (2011)). Let Acc' be the rate parame- 
ter of the Poisson distribution in Bed, which can be 
calculated from (9) and (11) 



A, 



|vJ-|Ve'l-r, 



(12) 



Given matrix B, it is easy to sample the adjacency 
matrix A. Uniformly sampling Scc'-number of 
pairs in Vc x Vd for each nonzero entry Bed of B, 
and incrementing Aij by 1 for each sampled pair will 
sample A conditioned on B. An argument similar to 
the proof of Theorem 2 can be used to show the validity 
of such an operation. 



to enforce the constraint that every 9^^^ parameter is better 



^Yun & Vishwanathan (2012) call it attribute configu- 
ration, but in our setting we think color conveys the idea 
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Let m be the maximum number of nodes with the same 
color 



m := max iVc 

0<c<n-l 



(14) 



(a) A 



(b) A' 



(c) A A' 



Using the notation in (4), if one generates a random 
matrix B' from BDP with the parameter with each 
component O'C^) defined as 



Figure 2. (a) Poisson parameter matrix A of target distri- 
bution B. (b) Parameter matrix A' of proposal distribution 
B' . Each entry of A' must be higher than the corresponding 
entry in A for B' to be a valid proposal, (c) The acceptance 
ratio is obtained by Hadamard (element-wise) division of 
A by A'. The acceptance ratio is high when the gap be- 
tween A and A' is small. In all three figures darker cells 
imply higher values and a white cell denotes a zero value. 
Parameters 9 = (0.7, 0.85; 0.85, 0.9), d = 3 and = 0.7 
was used for these plots. 

That said, now the question is how to efhciently sample 
B. In Section 4.4, we will efficiently construct another 
n X n random matrix B' , with B'^^, following an inde- 
pendent Poisson distribution with parameter A^^,,. B' 



aik) nik) 

^00 "^01 

nik) nik) 

'lO "11 



(15) 



then, by calculation we have A^^, = rri^Tcc'- From 
definition (12) and (14), it is obvious that (13) holds 



Acc' - |Ve| • \Vc'\ ■ r,,, < m2 . Tec' = A^, 

and hence B' is a valid proposal for B. 



(16) 



We now investigate the time complexity of sampling 
B' . Since BDP with parameter generates ck num- 
ber of edges in expectation, B' will generate ■ bk 
edges in expectation because its BDP parameter is 
Q = m^/"' 8. As sampling each edge takes 0{d) time. 



will bound B, in the sense that for any c and c' (see the overall time complexity is O (d • • ck) 



Figure 2), 



Ar_r' < A' 



(13) 



If Ai 



(1) 



M) 



0.5 and n 



For each nonzero value of B'^^, , sampling from the bino- 
mial distribution of size B'^^, with parameter will 

generate a valid Bed- As a filtered Poisson process 
is still a Poisson process with an adjusted parameter 
value, this step remains valid (Chapter 13.3, DasGupta 
(2011)). 

To summarize, we will first generate B' , use B' to sam- 
ple B, and then convert B to A. The time complexity 
of the algorithm is dominated by the generation of B' . 
See Algorithm 2 of Appendix B for the pseudo-code. 

Note that the relation between B and B' is similar 
to that between target and proposal distribution in 
accept-reject sampling. While B is the target distri- 
bution we want to sample, we first generate a proposal 
B' and correct each entry B'^^, using acceptance ratio 
^7^. Just like it is important to find a good proposal 
distribution which compactly bounds the target distri- 
bution in accept-reject sampling, we need B' which 
compactly bounds B. The remainder of this section is 
devoted to show how this can be done. 

4.2. Simple Illustrative Proposal 

To illustrate the idea behind our construction of B\ let 
us first construct a simple but non-optimal proposal. 



Yun & Vishwanathan (2012) showed that m < log2 n 
with high probability. Therefore, the overall time com- 
plexity of sampling is O • (log2 n)^ • ck^ . 

Roughly speaking, the quilting algorithm of 
Yun & Vishwanathan (2012) always uses the same 
B' irrespective of fi^'^^^a. When /i^'^^'s are not exactly 
equal to 0.5, m is no longer bounded by log2 n. To 
resolve this problem Yun & Vishwanathan (2012) 
suggest some heuristics. Instead, we construct a more 
careful proposal which adapts to values of /x'-'^-' . 

4.3. Partitioning Colors 

To develop a better proposal B' , we define quantities 
similar to m but bounded by log2 n with high prob- 
ability for general fi^'''> values. To do this, we first 
partition colors into a set of frequent colors J- and in- 
frequent colors I 



^:={c:E[|Vc|] >1}, 
Z:={c:E[|Vc|<l]} = 



{0, 



1}\^. 



(17) 
(18) 



The rational behind this partitioning is as follows: 
When E[|Vc|] > 1, the variance is smaller than that 
of the mean thus yar[|Vc|] < E[|Vc|]. On the other 
hand, when E|T4| < 1, then the variance is greater 
than that of the mean thus yar[|Vc|] > E[|Vc|]. 
Therefore, the frequencies of colors in and those 
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A' 



A^^ 



Figure 3. Decomposition of A' into Ai,x, Af,i and 

Ai,;r . Parameters 9 = (0.7, 0.85; 0.85, 0.9), d = 3 and 
H = 0.7 was used. It can be seen that the values of A'^'^ 
are concentrated on highly probable pairs, while the values 
of A"^"^ are relatively spread out. 

in X behave very differently, and we need to account 
for this. We define 





1) 0.2 (1.4 U.6 U.fi 1 



I) 0.2 0.4 0.6 0.8 1 



max ■ 

cer E \\Vc 



mr :— max iVc 



(19) 



Theorem 3 (Bound of Color Frequencies) With 
high probability, mjr. mx < log2 n. 

Proof See Appendix A. ■ 



4.4. Construction of Proposal Distribution 

Finally, we construct the proposal distribution. The 
matrix B' is the sum of four different BDP matrices 



Figure 4. Values of em, ek, ckm and cmk when d = 1 and 
e = (0.15, 0.7; 0.7, 0.85) or 9 = (0.35, 0.52; 0.52, 0.95) was 
used. One can see that ckm and cmk are between bm and 
ck, but for general 9 it may not be the case. 



The following theorem proves that B' is a valid pro- 
posal. That is, B' bounds B in the sense discussed in 
Section 4.1, and therefore given B' we can sample B. 

Theorem 4 (Validity of Proposal) For any c and 

c! such that < c, c' < n — 1, we have 



(22) 



Proof See Appendix A. 3. ■ 

Also see Algorithm 2 of Appendix B for the pseudo- 
code of the overall algorithm. 



B' = + + + B^^^l (20) 4.5. Time Complexity 



Intuitively, B'-^^'^ concentrates on covering entries of 
B between frequent colors, while S^-^-^) spreads out its 
parameters to ensure that every entry of B is properly 
covered. On the other hand, i?^-^-^^ and B^"^^^ cov- 
ers entries between a frequent color and other colors. 
Figure 3 visualizes the effect of each component. 

For A,B ^ {J-", I}, let o''"^^'' and d be parameters of 
BDP B^-^'^\ Following notation in (4) again, the k-th 
component of these matrices are defined as 



As it takes 9 (d) time to generate each edge in BDP, 
let us calculate the expected number of edges B' will 
generate. The following quantities similar to (5) and 
(8) will be found useful 



eMK = " • n (1 - Ai) 



1-1 nik) 



k=l \()<a.b<l 



i-fc/)(fe) 



(23) 



(24) 



fe=l \0<a,6<l 



(1 



1-M 



,(fe) 

00 



(1 



^11 



(1 



oik) 
-^01 



'oi 



{k)n{k) 



q(fc) 
^00 
qik) 
^10 



q(k) 
^01 
qik) 
^11 



^11 



(21) 



In general, cmk and ckm are not necessarily lower or 
upper bounded by cm or ck- However, for many of 
known parameter values for KPGM and MAGM, espe- 
cially those considered in Kim & Leskovec (2010) and 
Yun & Vishwanathan (2012), we empirically observe 
that they are indeed between cm and bk 

mm{eM,eK} < eMK-,eKM < max{eM,eif}. (25) 

see Figure 4 for a graphical illustration. 

From straightforward calculation, B'^-'^ ^\ B''-^^\ 
B^^-^^ and B^^^) generates m^rCA/, mjrmxeMK, 
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mimjreKM and rn^eK edges in expectation, respee- 
tively. By Theorem 3, the overah time complexity is 

O {d ■ (log2 nf ■ {ex + ekm + cmk + cm)^ with high 
probability. When (25) holds, it can be further sim- 
phfied to O • (logj ■ {ck + SAifj . Note that d is 
also usually chosen to be d < log2 n. This implies that 
the time complexity of the whole algorithm is almost 
linear in the number of expected edges in MAGM and 
an equivalent KPGM. 

Note that the time complexity of algorithm in 
Yun & Vishwanathan (2012) is at least n,{d ■ ex) and 
attains the best guarantee of 0{d{\og2'n) ck) when 
sm = sk- When (25) holds, therefore, our algorithm 
is at least as efficient as their algorithm. 

4.6. Combining two Algorithms 

Note that one can combine our algorithm and the algo- 
rithm of Yun & Vishwanathan (2012) to get improved 
performance. For both algorithms, it only takes O {nd) 
time to estimate the expected running time. Thus one 
can always select the best algorithm for a given set of 
parameter values. 

5. Experiments 

We empirically evaluated the efficiency and scalabil- 
ity of our sampling algorithm. Our experiments are 
designed to answer the following questions: 1) How 
does our algorithm scale as a function of e a/ , the ex- 
pected number of edges in the graph? 2) What is the 
advantage of using our algorithm compared to that of 
Yun & Vishwanathan (2012)? 

Our algorithm is implemented in C++ and will be made 
available for download from http: //anonymous. For 
quilting algorithm, we used the original implementa- 
tion of Yun & Vishwanathan (2012) which is also writ- 
ten in C++ and compiled with the same options. All 
experiments are run on a machine with a 2.1 GHz pro- 
cessor running Linux. 

Following Yun & Vishwanathan (2012), we uniformly 
set n = 2'^, and used the same 6 matrices and /i val- 
ues at aU levels: i.e., 6 = O^^) = O^^) = . . . = e^'') 
and fi = ^S^^ — ■■■ — Furthermore, we ex- 

perimented with the following O matrices used by 
Kim & Leskovec (2010) and Moreno & Neville (2009) 
to model real world graphs: 



(a) 11 = 0.2 



Oi = 



0.15 
0.7 



0.7 

0.85 



and 02 = 



0.35 
0.52 



0.52 
0.95 



Figure 5 shows the running time of our algorithm vs 
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Figure 5. Comparison of running time (in sec- 
onds) of our algorithm vs the quilting algorithm of 
Yun & Vishwanathan (2012) as a function of expected 
number of edges em for two different values of and five 
values of fi. 
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Figure 6. Comparison of running time (in sec- 
onds) of our algorithm vs the quilting algorithm of 
Yun & Vishwanathan (2012) as a function of n for two 
different values of O and n = 2^^. 



Yun & Vishwanathan (2012) as a function of expected 
number of edges bm- Each experiment was repeated 
ten times to obtain error bars. As our algorithm has 
theoretical time complexity guarantee, irrespective of 
fi the running time is almost linear in e^/. On the other 
hand, Yun & Vishwanathan (2012) shows superb per- 
formance when dealing with relatively dense graphs 
(/Lt > 0.5), but when dealing with sparser graphs 
(/i < 0.5) our algorithm outperforms. 

Figure 6 shows the dependence of running time on fi 
more clearly. In our parameter setting, the number 
of expected edges is an increasing function of fi (see 
Figure 4 for d = 1). As the time complexity of our 
algorithm depends on cm, the running time of our al- 
gorithm increases accordingly as ^ increases. In the 
case of quilting algorithm, however, the running time 
is almost symmetric with respect to /i = 0.5. Thus, 
when < 0.5 it is relatively inefficient, compared to 
when /i > 0.5. 

6. Conclusion 

We introduced a novel and efficient sampling algo- 
rithm for the MAGM. The run-time of our algorithm 
depends on ex and cm- For sparse graphs, which 
are primarily of interest in applications, the value of 
Cm is well bounded, and our method is able to out- 
perform the quilting algorithm. However, when /i is 



greater than 0.5, MAGM produces dense graphs. In 
this case the heuristics of Yun & Vishwanathan (2012) 
work well in practice. One can combine the two algo- 
rithms to produce a fast hybrid algorithm. Theoretical 
investigation of the quilting algorithm and its heuris- 
tics may provide more insights into improving both 
algorithms. 

For the parameter settings we studied the correspond- 
ing KPGM graphs are sparse and can be sampled effi- 
ciently. However, for some values of Q the correspond- 
ing KPGM graphs can become dense and difficult to 
sample. Removing dependency of time complexity on 
ck remains an open question, and a focus of our future 
research. 
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A. Technical Proofs (not included in 8 
page limit) 

A.l. Proof of Theorem 2 

Proof By conditioning on the number of edges \£\, 
the probability mass function can be written as 



[A]=F[\£\]-P[A\\£\]^ 



(26) 



Recall that the marginal distribution of \£\ follows 
Poisson distribution with rate parameter en- Using 
(5) and the definition of a Poisson probability mass 
function, 



P[\£\ 



exp 




(27) 



On the other hand, the conditional distribution of A 
given \£\ is defined by the multinomial distribution, 
and its probability mass function is given by 



'[A\\£\]^ 




En 



(28) 



where (^^ ^^j'^L.^^ ) is the multinomial coefficient. 
By definition := ^^'^ after some simple 

algebra, we have 



(29) 



By the factorization theorem, every Aij is independent 

i 

distribution with rate parameter Tij. 



of each other. Furthermore, Aij follows a Poisson 



A.2. Proof of Theorem 3 

Proof For c G 7^, we apply the multiplicative 
form of Hoeffding-Chernoff inequality (Chapter 35.1, 
DasGupta (2011)) to get 



|Vc| >log2n-E[|Vc|]] < 



exp (log2 n - 1) 
, (log2") 



< 



exp(log2n - 1) 
. (log^n)'"^^" , 



E[|Ve|] 



(30) 



(31) 
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for large enough n. Then using the union bound 



B. Pseudo-Code of Algorithms 



U |V,| >log2n.E[|Vc 



< 5]F[|Vc| >log2n.E 
(32) 



1^ 



;orithm 1 Description of Ball-Dropping Process 



(log2 n) 



log2 n 



(33) 



as n ^ oo. For c G X, on the other hand, we apply the 
additive form of Hoeffding-Chernoff inequahty: 



\Vc\ > log2n] < 



IE[|V.|] 



< n 



log2« 

1 1 — log2 n/n 
log2 n 1 ~ 1/n 



l-E[|Ve|]/n 

1 - log2 n/ n 

(34) 



(35) 



1 — loo 



nction BDP 
Input: parameter Q 
Output: set of edges £ 

eK ^ nti {O^o^ + + 0[l^ 
Generate X Poisson^ex)- 
for a; = 1 to X do 

S start start ^ 1 
2 ^ Send: ^end ^ ^ 

for A; 1 to d do 

Sample (a, b) oc O^^^j 

q , c I an 

iJstart ^ >Jstart ~r 2k • 

Ti , Ti j_ bn 

start ^ start ~r 2^ ■ 

^end ~ ^end 



Using union bound again, 

U |Vc| >log2n 



2k 
(l-b)rt 
2k 



.cel 



(36) 



end for 

# We have S start — Send, Tstart — 
S S ^{{Sstart, Tstart)} 

end for 



A. 3. Proof of Theorem 4 

Proof Let A^-^'^^ be rate parameter matrix of B^^'^\ 
From the definition (3), 



yY(AB) ._ QiA,B)il) ^ 0(A8)(2) (g, . . . e(AB)(<i)^ 



(37) 



From (21) and (37), it is easy to verify that 



A^ 



(II) 



A^ , 



{m:rf ■E[\Ve\]-E[\Vc'\]-re 
mjr • TOi • E [|Vc|] • Fee' 

: TOI • mjr ■ E [\Vc'\] ■ Tec' 
{mif Tee' 



Using (12) and (19) obtains 



Ker' < a; 



{AB) 



< AL,, 



for c £ T,c' £ T, 

for c e J", c' e I, 

for c el,c' e 

for c £l,c' el . 



(38) 



for any A,Be {J", I}, c e A, and c' e B. 
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^^^'^ Algorithm 2 BDP Sampler ol MAGAI ^^''^ 

1118 — ^-^7 1173 

^"1^29 Input: parameters 0, jl 1174 

112Q Output: set of edges £ 1175 

1121 ^^'^ 1176 

1122 for ^ in {J", T} do -^-^jj 

1123 for^in{^,Z}do ^^78 

1124 for (c, c') in BDP ' j do 1179 

1125 if c G ^ and c' e B then 1180 

1126 Generate u ~ Uniform{0, 1). 1181 

1127 ifu<^ then 1182 

1128 " ' 1183 
]^-|^29 Sample i uniformly from Vc- 1184 
ll^Q Sample j uniformly from Vc' ■ ]^]^85 

1131 £^£u{{i,j)}. 1186 

1132 end if 1187 

1133 en'* if 1188 

1134 end for 118g 
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